Concordance Measure-based Feature Screening and Variable Selection
Authors
Yunbei Ma, Yi Li, Huazhen Lin and Yi Li
Abstract
The C-statistic, which measures the rank concordance between predictors and outcomes, has become a standard metric of predictive accuracy and is therefore a natural criterion for variable screening and selection. However, because the C-statistic is a step function, optimizing it requires a brute-force search, which prohibits its direct use when the predictors are high-dimensional. We develop a smoothed version of the C-statistic to facilitate variable screening and selection. Specifically, we propose a smoothed C-statistic sure screening (C-SS) method for screening ultrahigh-dimensional data, and a penalized C-statistic (PSC) variable selection method for regularized modeling based on the screening results. We show that these two coherent procedures form an integrated framework for screening and variable selection: the C-SS possesses the sure screening property, and the PSC possesses the oracle property. In particular, the PSC achieves the oracle property if $m_n = o(n^{\ldots})$, where $m_n$ is the cardinality of the set of predictors captured by the C-SS. Our extensive simulations show that, compared with existing procedures, our proposal is more robust and efficient. We apply the procedure to a multiple myeloma study, where it identifies several novel genes that predict patients' response to treatment.
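The core idea of the abstract (replace the step-function C-statistic with a smooth surrogate, then use it to rank predictors) can be illustrated with a small Python sketch. This is a minimal sketch under stated assumptions, not the authors' C-SS implementation: the logistic kernel, the bandwidth `h`, and the screening utility |C − 0.5| (chosen so that strongly discordant predictors are also retained) are illustrative choices, and the function names `c_statistic`, `smoothed_c_statistic` and `screen_by_smoothed_c` are hypothetical.

```python
import math
from typing import List, Sequence


def _sigmoid(t: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def c_statistic(score: Sequence[float], y: Sequence[float]) -> float:
    """Empirical C-statistic: among pairs with y_i > y_j, the fraction in which
    score_i > score_j (ties in the score count 1/2). A step function of the scores."""
    num, den = 0.0, 0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:  # only pairs ordered by the outcome are comparable
                den += 1
                if score[i] > score[j]:
                    num += 1.0
                elif score[i] == score[j]:
                    num += 0.5
    return num / den if den else float("nan")


def smoothed_c_statistic(score: Sequence[float], y: Sequence[float],
                         h: float = 0.1) -> float:
    """Smoothed C-statistic: the indicator I(score_i > score_j) is replaced by
    sigmoid((score_i - score_j) / h). Assumption: logistic kernel, bandwidth h > 0.
    Small h approaches the step function; large h flattens the value toward 0.5."""
    num, den = 0.0, 0
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                den += 1
                num += _sigmoid((score[i] - score[j]) / h)
    return num / den if den else float("nan")


def screen_by_smoothed_c(columns: Sequence[Sequence[float]], y: Sequence[float],
                         d: int, h: float = 0.1) -> List[int]:
    """Hypothetical marginal screening rule: score each predictor column by
    |smoothed C - 0.5| and return the indices of the d highest-scoring columns."""
    utility = [abs(smoothed_c_statistic(col, y, h) - 0.5) for col in columns]
    order = sorted(range(len(columns)), key=lambda k: utility[k], reverse=True)
    return order[:d]


if __name__ == "__main__":
    y = [1, 2, 3, 4, 5, 6, 7, 8]
    x0 = list(y)                   # perfectly concordant with y
    x1 = [-v for v in y]           # perfectly discordant with y
    x2 = [3, 8, 1, 6, 2, 7, 4, 5]  # only weakly related to y
    print(c_statistic(x0, y), c_statistic(x1, y))  # prints: 1.0 0.0
    print(screen_by_smoothed_c([x0, x1, x2], y, d=2))
```

Because the smoothed version is differentiable in the scores, it can be optimized with gradient methods, which the step-function version does not allow. The double loop costs O(n²) per predictor, which is acceptable for a sketch; in the example, screening with `d=2` keeps the two informative predictors `x0` and `x1` and drops the weakly related `x2`.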